
hit-like molecule


TextOmics-Guided Diffusion for Hit-like Molecular Generation

Yuan, Hang; Li, Chen; Ma, Wenjun; Jiang, Yuncheng

arXiv.org Artificial Intelligence

Hit-like molecular generation with therapeutic potential is essential for target-specific drug discovery. However, the field lacks heterogeneous data and unified frameworks for integrating diverse molecular representations. To bridge this gap, we introduce TextOmics, a pioneering benchmark that establishes one-to-one correspondences between omics expressions and molecular textual descriptions. TextOmics provides a heterogeneous dataset that facilitates molecular generation through representation alignment. Built upon this foundation, we propose ToDi, a generative framework that jointly conditions on omics expressions and molecular textual descriptions to produce biologically relevant, chemically valid, hit-like molecules. ToDi leverages two encoders (OmicsEn and TextEn) to capture multi-level biological and semantic associations, and develops a conditional diffusion model (DiffGen) for controllable generation. Extensive experiments confirm the effectiveness of TextOmics and demonstrate that ToDi outperforms existing state-of-the-art approaches, while also showing remarkable potential in zero-shot therapeutic molecular generation. Source code is available at: https://github.com/hala-ToDi.
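
The abstract names a conditional diffusion component (DiffGen) without further detail. For orientation only, the stdlib Python sketch below shows the generic DDPM forward-noising step and classifier-free guidance, a common way to steer a conditional diffusion model. It is not ToDi's implementation: the linear beta schedule, all helper names, and the plain concatenation in `build_condition` (a hypothetical stand-in for fusing OmicsEn and TextEn outputs) are assumptions.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM default, assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, noise):
    """Forward noising: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]


def build_condition(omics_emb, text_emb):
    """Hypothetical fusion: concatenate omics and text embeddings into one condition vector.
    A trained denoiser eps_theta(x_t, t, cond) would consume this vector."""
    return list(omics_emb) + list(text_emb)


def guided_noise(eps_cond, eps_uncond, w):
    """Classifier-free guidance: (1 + w) * eps_cond - w * eps_uncond; w = 0 gives eps_cond."""
    return [(1.0 + w) * c - w * u for c, u in zip(eps_cond, eps_uncond)]


# Usage: noise a toy 3-dimensional latent halfway through a 1000-step schedule.
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
cond = build_condition([0.1, 0.2], [0.3])
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]
noise = [rng.gauss(0.0, 1.0) for _ in x0]
xt = q_sample(x0, 500, alpha_bars, noise)
```

Larger guidance weights `w` push samples harder toward the condition at some cost in diversity; whether ToDi uses guidance at all is not stated in the abstract.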


Gx2Mol: De Novo Generation of Hit-like Molecules from Gene Expression Profiles via Deep Learning

Li, Chen; Matsukiyo, Yuki; Yamanishi, Yoshihiro

arXiv.org Artificial Intelligence

Abstract: De novo generation of hit-like molecules is a challenging task in the drug discovery process. Most methods in previous studies learn the semantics and syntax of molecular structures by analyzing molecular graphs or simplified molecular input line entry system (SMILES) strings; however, they do not take into account the drug responses of biological systems consisting of genes and proteins. In this study, we propose a deep generative model, Gx2Mol, which utilizes gene expression profiles to generate molecular structures with desirable phenotypes for arbitrary target proteins. In the algorithm, a variational autoencoder is employed as a feature extractor to learn the latent feature distribution of the gene expression profiles. Then, a long short-term memory (LSTM) network is leveraged as the chemical generator to produce syntactically valid SMILES strings that satisfy the feature conditions of the gene expression profile extracted by the feature extractor. Experimental results and case studies demonstrate that the proposed Gx2Mol model can produce new molecules with potential bioactivities and drug-like properties.

Exploring the chemical space to discover molecules with therapeutic effects (e.g., anticancer drug production) is a time-consuming, costly, and high-risk task in the drug discovery field. However, most methods in the previous studies focused on learning the syntax and semantics of molecular structures by analyzing molecular graphs or simplified molecular input line entry system …
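
To make the feature-extractor step more concrete, here is a minimal sketch, assuming a standard Gaussian VAE, of how a latent vector is sampled with the reparameterization trick and how the KL regularizer is computed. The encoder outputs (`mu`, `logvar`) are hypothetical placeholder values, the function names are illustrative rather than taken from the Gx2Mol code, and the LSTM generator that consumes the latent is not shown.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with sigma = exp(0.5 * logvar) and eps ~ N(0, 1).
    Writing the sample this way keeps z differentiable with respect to mu and logvar."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(sigma^2)) || N(0, I)) = -0.5 * sum(1 + logvar - mu^2 - exp(logvar))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


# Usage with hypothetical encoder outputs for one gene expression profile.
rng = random.Random(0)
mu = [1.0, 0.0]
logvar = [0.0, 0.0]
z = reparameterize(mu, logvar, rng)  # latent condition for the (not shown) SMILES generator
kl = gaussian_kl(mu, logvar)         # 0.5
```

In a trained model, `z` would be passed to the generator as its condition; exactly how Gx2Mol injects the latent features into the LSTM is not described in this abstract.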


Congratulations to the #AAAI2024 outstanding paper winners

AIHub

The AAAI 2024 outstanding paper awards were announced at the conference on Thursday 22 February. Papers are recommended for consideration during the review process by members of the Program Committee. This year, three papers have been selected as outstanding papers.

Abstract: Multi-view learning aims to combine multiple features to achieve more comprehensive descriptions of data. Most previous works assume that multiple views are strictly aligned.